Latent Features of Numbers Learned by Sequence Models

by Peter de Blanc + ChatGPT Deep Research
Posted to Adarie (www.adarie.com) on April 16, 2025
Content License: Creative Commons CC0 (No Rights Reserved)


Researchers have developed various ways to embed integers as distinct tokens in sequence modeling tasks (e.g. using OEIS data). In these approaches, each number is treated like a “word” with its own vector representation, allowing models to learn mathematical relationships from the contexts in which numbers appear. Below we summarize the key techniques for learning such embeddings, how the infinite vocabulary of integers is managed, and what semantic structure emerges in the learned vector spaces.

Methods for Learning Number Embeddings as Tokens
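One representative setup, sketched below, is to train word2vec-style skip-gram embeddings directly on integer sequences, with every integer treated as its own token. This is only a minimal illustration of the general recipe described above; the corpus file name and hyperparameters are assumptions for the sketch, not details taken from any particular paper.

```python
# Minimal sketch: skip-gram embeddings over integer-token sequences.
# Assumptions: gensim is installed, and "oeis_sequences.txt" (hypothetical)
# holds one sequence per line with terms separated by spaces.
from gensim.models import Word2Vec

with open("oeis_sequences.txt") as f:
    sequences = [line.split() for line in f]   # each integer term is a distinct token

model = Word2Vec(
    sentences=sequences,
    vector_size=100,   # dimensionality of each number's embedding
    window=5,          # how many neighboring terms count as context
    min_count=5,       # drop very rare numbers (see the vocabulary section below)
    sg=1,              # skip-gram objective
)

# Numbers that appear in similar sequence contexts end up with similar vectors.
print(model.wv.most_similar("16", topn=5))
```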

Vocabulary Size and Handling of Infinite Integers

A core challenge in treating numbers as tokens is the infinite vocabulary of integers. In practice, researchers impose limits so that only certain numbers receive dedicated embeddings.

Projects that embed numbers as tokens therefore choose a cutoff that suits their data and goals. In OEIS-based experiments, assigning an embedding to every number that appears with at least moderate frequency (and mapping everything rarer to an UNK token) has proven effective. For other corpora, one might keep only the top K most frequent numbers (for instance, in a scientific corpus where certain constants recur) and treat the rest as unknown, or fall back to digit-level decomposition. The overarching aim is to balance coverage (having embeddings for the numbers that matter in the domain) against generalization (not overfitting to rare tokens, and handling unseen numbers reasonably).
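A minimal sketch of this frequency-cutoff scheme follows; the threshold value and the <unk> token name are illustrative assumptions.

```python
# Minimal sketch of a frequency-cutoff vocabulary with an UNK fallback.
from collections import Counter

def build_vocab(sequences, min_count=5, unk_token="<unk>"):
    """Give a dedicated id only to numbers seen at least `min_count` times."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = {unk_token: 0}
    for tok, freq in counts.most_common():
        if freq >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(seq, vocab, unk_token="<unk>"):
    """Map each number to its id, falling back to UNK for rare or unseen ones."""
    return [vocab.get(tok, vocab[unk_token]) for tok in seq]

toy = [["1", "1", "2", "3", "5", "8"], ["2", "4", "8", "16"]]
vocab = build_vocab(toy, min_count=2)
print(encode(["8", "16", "99991"], vocab))  # "16" and "99991" fall back to <unk>
```

A digit-decomposition fallback would instead split an out-of-vocabulary number into digit tokens rather than collapsing it to UNK.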

Semantic Structure in Learned Number Embeddings

Even though these embeddings are learned without explicit labels for number properties, researchers have found that certain human-interpretable numerical concepts emerge as directions or clusters in the vector space. Probing and visualizing number embeddings have revealed axes that correspond to fundamental properties such as parity, divisibility, primality, and magnitude.
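As a sketch of how such probing works, a linear classifier can be fit to predict a property (here, parity) from the embedding vectors. The random stand-in vectors below exist only so the snippet runs on its own and should be replaced with learned embeddings.

```python
# Minimal probing sketch: is parity linearly decodable from the embeddings?
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
numbers = list(range(1, 1001))
# Stand-in vectors; substitute the learned number embeddings here.
number_vecs = {n: rng.normal(size=100) for n in numbers}

X = np.stack([number_vecs[n] for n in numbers])
y = [n % 2 for n in numbers]   # parity labels; primality or divisibility work the same way

probe = LogisticRegression(max_iter=1000).fit(X[:800], y[:800])
print("held-out parity accuracy:", probe.score(X[800:], y[800:]))
# High accuracy on real embeddings suggests parity corresponds to a linear
# direction in the space; on the random stand-ins it stays near 50%.
```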

In summary, when integers are embedded as individual tokens based on sequence co-occurrence, the resulting vector space is rich in latent numeric knowledge. Basic number-theoretic properties (parity, divisibility, primality) end up encoded either along single dimensions or in linear combinations of dimensions. Quantitative attributes like magnitude can also emerge, especially if the training data emphasizes them. Perhaps most interestingly, the embedding space captures semantic groupings of numbers: those that belong to the same well-defined sequence or category are located near each other, making it possible to retrieve, say, other primes or other square numbers by vector proximity.

This kind of structure can be exploited for tasks like sequence classification or completion. In fact, the motivation behind learning these embeddings is often to improve performance on downstream tasks in mathematical AI. For example, Ryskina et al. showed that using OEIS-trained number embeddings significantly improved a model’s ability to complete integer sequences and solve analogy questions compared to using off-the-shelf word embeddings. The learned vectors encode information that a model can leverage to guess the next term of a sequence or identify what property a sequence might obey.
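The nearest-neighbor retrieval mentioned above can be sketched with cosine similarity over the embedding table; again the random stand-in vectors are only there to make the example self-contained.

```python
# Minimal sketch of retrieval by vector proximity (cosine similarity).
import numpy as np

rng = np.random.default_rng(0)
number_vecs = {n: rng.normal(size=100) for n in range(1, 1001)}  # stand-in; use learned vectors

def nearest(query, vecs, k=5):
    """Return the k numbers whose embeddings are most similar to `query`'s."""
    q = vecs[query] / np.linalg.norm(vecs[query])
    scores = {n: float(v @ q) / np.linalg.norm(v) for n, v in vecs.items() if n != query}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# With embeddings trained on OEIS-style data, the neighbors of a prime such as
# 97 would be expected to include other primes; with random stand-ins they are arbitrary.
print(nearest(97, number_vecs))
```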
